Evaluating automatic syllabification algorithms for English
Abstract
• The three lexical databases
• 18,016 words were found in both the Webster’s Pocket Dictionary and the Wordsmyth English Dictionary-Thesaurus.
• These two independent dictionaries, each consisting of 18,016 syllabified entries, are referred to as S&R and Wordsmyth, respectively.
• A third database, Intersection, was derived, consisting of the 13,594 words whose syllabification patterns are identical in the two independent dictionaries.
Institute for Biodiagnostics
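The derivation of the Intersection database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_intersection`, the hyphen-delimited string format, and the toy entries are all hypothetical and are not taken from the original dictionaries or from the authors' tooling.

```python
def build_intersection(dict_a, dict_b):
    """Keep only the words that occur in both dictionaries
    AND carry an identical syllabification in each."""
    shared_words = dict_a.keys() & dict_b.keys()
    return {
        word: dict_a[word]
        for word in shared_words
        if dict_a[word] == dict_b[word]
    }


# Toy data (hypothetical, for illustration only): each entry maps a
# word to its syllabification, with "-" marking syllable boundaries.
sr = {"water": "wa-ter", "lemon": "lem-on", "basic": "ba-sic"}
wordsmyth = {"water": "wa-ter", "lemon": "le-mon", "basic": "ba-sic"}

intersection = build_intersection(sr, wordsmyth)

# Fraction of shared words on which the two sources agree.
agreement = len(intersection) / len(sr.keys() & wordsmyth.keys())
```

Applying the same ratio to the figures in the abstract, 13,594 agreeing words out of 18,016 shared words, gives an agreement of roughly 75% between the two dictionaries.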
Similar resources
Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
We present the first English syllabification system to improve the accuracy of letter-to-phoneme conversion. We propose a novel discriminative approach to automatic syllabification based on structured SVMs. In comparison with a state-of-the-art syllabification system, we reduce the syllabification word error rate for English by 33%. Our approach also performs well on other languages, comparing f...
Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian
Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been challenged for the English language where data-driven algorithms have been shown to outperform rule-based methods. It may be possible, however, that data-driven methods are only better for languages with complex syllable structures. In this study, three rule-b...
Automatic syllabification in English: a comparison of different algorithms.
Automatic syllabification of words is challenging, not least because the syllable is not easy to define precisely. Consequently, no accepted standard algorithm for automatic syllabification exists. There are two broad approaches: rule-based and data-driven. The rule-based method effectively embodies some theoretical position regarding the syllable, whereas the data-driven paradigm tries to infe...
Are rule-based syllabification methods adequate for languages with low syllabic complexity? The case of Italian
Syllabification information is a valuable component in speech synthesis systems. Linguistic rule-based methods have been assumed to be the best technique for determining the syllabification of unknown words. This assumption has recently been shown to be incorrect for English, where data-driven algorithms outperform rule-based methods. It may be possible, however, that data-d...
Automatic word stress marking and syllabification for Catalan TTS
Stress and syllabification are essential attributes for several components in text-to-speech (TTS) systems. They are responsible for improving grapheme-to-phoneme conversion rules and for enhancing the synthetic intelligibility, since stress and syllable are key units in prosody prediction. This paper presents three linguistically rule-based automatic algorithms for Catalan text-to-speech conve...
Publication date: 2007